Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Decompose server class #8468

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Jan 19, 2024

I was thinking a little more about how #8430 could be used and ended up wanting to use this for the client as well.

(This is also loosely motivated by dask/dask-expr#765 where I suggested to move the key generation / tokenization from client to scheduler where the ordered sendrcv of #8430 would be nice)

The client unfortunately re-implements the RPC stream handler of the server base class making it hard to reuse the ordered sendrcv without a lot of code duplication.

This PR suggests a refactoring where I propose to move from inheritance to composition regarding the Server networking and RPC logic. The current Server base class is mixing a lot of application logic with networking logic which in the past frequently caused issues during teardown so separation of those concerns is long overdue.

The Server as proposed in this PR should likely rather be called RPCServer or RPCSomethingElse since it doesn't actually implement a traditional server<->client model but I'm trying to not get hung up on naming at this point.

With this decomposition I was then able to use the slimmed down Server in the client as well which opens up the reuse of smth like #8430
This refactoring is not really done but if we go down this road, the universally used ConnectionPool and stream_comms would basically blend into this single object which would support three ways to communicate

  1. One way communication, send only, ordered / multiplexed (what we currently do with BatchedSend)
  2. Two way send_recv, not ordered, dedicated channel (what is currently the PooledRPCCall)
  3. Two way send_decv, ordered, multiplexed (this is new)

State of this PR: Mostly working but still a couple of failing tests. From what I saw nothing insurmountable.

There are two commits

  • efb9045 which decomposes the Server class and refactors a couple of modules
  • 395fcc2 which applies this decomposition to the Client

I think I'm mostly interested in feedback about the first one since that may have merit without the introduction of new features

@hendrikmakait
Copy link
Member

Some general comments without diving too deep into the actual changes:
We should definitely aim to use orderd send-recv wherever possible, so also for connections between the client and the cluster instances.

+1 on refactoring more of our codebase away from inheritance to composition where possible/useful. IMO, this also makes testing easier.
+1 on reusing an "RPC server" to handle all our RPC logic, it doesn't make much sense for our Client to reimplement that stuff.

Copy link
Contributor

github-actions bot commented Jan 19, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    29 files  ±    0      29 suites  ±0   14h 35m 48s ⏱️ + 3h 18m 41s
 4 083 tests ±    0   3 639 ✅  -   325    116 💤 +  4    326 ❌ +  319  2 🔥 +2 
50 647 runs   - 4 583  44 569 ✅  - 8 218  2 288 💤  - 145  3 781 ❌ +3 771  9 🔥 +9 

For more details on these failures and errors, see this check.

Results for commit f8ccddf. ± Comparison against base commit 8564dc7.

This pull request removes 10 and adds 10 tests. Note that renamed tests count towards both.
distributed.tests.test_core ‑ test_async_listener_stop
distributed.tests.test_core ‑ test_counters
distributed.tests.test_core ‑ test_server_assign_assign_enum_is_quiet
distributed.tests.test_core ‑ test_server_close_stops_gil_monitoring
distributed.tests.test_core ‑ test_server_status_compare_enum_is_quiet
distributed.tests.test_core ‑ test_server_status_is_always_enum
distributed.tests.test_core ‑ test_server_sys_path_local_directory_cleanup
distributed.tests.test_core ‑ test_thread_id
distributed.tests.test_core ‑ test_tick_logging
distributed.tests.test_core ‑ test_ticks
distributed.cli.tests.test_dask_worker.test_listen_address_ipv6[tcp:..[ ‑ 1]:---nanny]
distributed.cli.tests.test_dask_worker.test_listen_address_ipv6[tcp:..[ ‑ 1]:---no-nanny]
distributed.tests.test_node ‑ test_server_assign_assign_enum_is_quiet
distributed.tests.test_node ‑ test_server_close_stops_gil_monitoring
distributed.tests.test_node ‑ test_server_status_compare_enum_is_quiet
distributed.tests.test_node ‑ test_server_status_is_always_enum
distributed.tests.test_node ‑ test_server_sys_path_local_directory_cleanup
distributed.tests.test_node ‑ test_thread_id
distributed.tests.test_node ‑ test_tick_logging
distributed.tests.test_node ‑ test_ticks
This pull request skips 2 tests.
distributed.tests.test_metrics ‑ test_monotonic
distributed.tests.test_scheduler ‑ test_get_cluster_state_worker_error

♻️ This comment has been updated with latest results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants